Doubly-Logarithmic Time
Learn about XFastTrie and YFastTrie searching.
XFastTrie: Searching in doubly-logarithmic Time#
The performance of the BinaryTrie structure is not very impressive. The
number of elements, n, stored in the structure is at most , so . In other words, any of the comparison-based SSet are at least as efficient as a BinaryTrie, and are not restricted to only storing integers.
Next we describe the XFastTrie, which is just a BinaryTrie with
hash tables—one for each level of the trie. These hash tables are used to
speed up the find(x) operation to time. Recall that the find(x) operation in a BinaryTrie is almost complete once we reach a node, u, where the search path for x would like to proceed to u.right (or u.left) but u has no right (respectively, left) child. At this point, the search uses u.jump to jump to a leaf, v, of the BinaryTrie and either return v or its successor in the linked list of leaves. An XFastTrie speeds up the search process by using binary search on the levels of the trie to locate the node u.
To use binary search, we need a way to determine if the node u we are looking for is above a particular level, i, of if u is at or below level i. This information is given by the highest-order i bits in the binary representation of x; these bits determine the search path that x takes from the root to level i.
Visual demonstration of the search path#
The visual demonstration of the search path is shown below:
1 of 3
2 of 3
3 of 3
For an example, in the above illustration; the last node, u, on search path for (whose binary representation is ) is the node labelled at level because there is no node labelled at level . Thus, we can label each node at level with an -bit integer.
Then, the node u we are searching for would be at or below level i if and
only if there is a node at level i whose label matches the highest-order i bits of x.
In an XFastTrie, we store, for each , all the nodes at level i in a USet, t[i], that is implemented as a hash table. Using
this USet allows us to check in constant expected time if there is a node at level i whose label matches the highest-order i bits of x. In fact, we can even find this node using t[i].find (x>>>(w − i)).
The hash tables allow us to use binary search to find u. Initially, we know that u is at some level i with . We therefore initialize l = 0 and h = w + 1 and repeatedly look at the hash table t[i], where . If contains a node whose label matches ’s
highest-order bits then we set l = i (u is at or below level i); otherwise we set h = i (u is above level i). This process terminates when , in which case we determine that u is at level l. We then complete the find(x) operation using u.jump and the doubly-linked list of leaves.
Each iteration of the while loop in the above method decreases h − l by roughly a factor of two, so this loop finds u after iterations. Each iteration performs a constant amount of work and one find(x) operation in a USet, which takes a constant expected amount of time. The remaining work takes only constant time, so the find(x) method in an XFastTrie takes only expected time.
The add(x) and remove(x) methods for an XFastTrie are almost identical to the same methods in a BinaryTrie. The only modifications are for managing the hash tables . During the add(x) operation, when a new node is created at level i, this node is added to t[i]. During a remove(x) operation, when a node is removed form level i, this node is removed from t[i]. Since adding and removing from a hash table take constant expected time, this does not increase the running times of add(x) and remove(x) by more than a constant factor. We omit a code listing for add(x) and remove(x) since the code is almost identical to the (long) code listing already provided for the same methods in a BinaryTrie.
The following theorem summarizes the performance of an XFastTrie:
Theorem 1: An XFastTrie implements the SSet interface for -bit integers. An XFastTrie supports the operations
add(x)andremove(x)in expected time per operation andfind(x)in expected time per operation.
The space used by an XFastTrie that stores n values is .
Let’s now create an instance of XFastTrie class and adds and remove values using the add() and remove() methods.
Explanation
Let’s start our explanation from the start of main() in the main.py file.
- Line 126: It creates an instance of
XFastTrie(),xfast_trie. - Line 128–133: It adds value to the
xfast_trieusing theadd()method. - Line 138–140: It finds value to the
xfast_trieusing thefind()method. It will return the value if it’s present in thexfast_trie. - Line 142–144: It removes the values from the xfast_trie using the
remove()method and update thexfast_trie. - Line 149–151: It searches for the values in
xfast_trieusing thefind()method and returns the value if it is present.
YFastTrie: a doubly-logarithmic time SSet#
The XFastTrie is a vast—even exponential—improvement over the BinaryTrie in terms of query time, but the add(x) and remove(x) operations are still not terribly fast. Furthermore, the space usage, , is higher than the other SSet implementations, which all use space. These two problems are related; if add(x) operations build a structure of size , then the add(x) operation requires at least on the order of time (and space) per operation.
The YFastTrie, discussed next, simultaneously improves the space
and speed of XFastTries. A YFastTrie uses an XFastTrie, xft, but only
stores values in xft. In this way, the total space used by xft is only . Furthermore, only one out of every add(x) or remove(x) operations in the YFastTrie results in an add(x) or remove(x) operation in xft. By doing this, the average cost incurred by calls to xft’s add(x) and remove(x)
operations is only constant.
The obvious question becomes: If xft only stores elements, where do the remaining elements go? These elements move into secondary structures, in this case an extended version of treaps. There are roughly of these secondary structures so, on average, each
of them stores items. Treaps support logarithmic time SSet operations, so the operations on these treaps will run in time, as required.
More concretely, a YFastTrie contains an XFastTrie, xft, that contains a random sample of the data, where each element appears in the sample independently with probability . For convenience, the value , is always contained in xft. Let denote the elements stored in xft. Associated with each element, , is a treap, , that stores all values in the range .
Visual demonstration ofYFastTrie#
This is illustrated below:
The find(x) operation in a YFastTrie is fairly easy. We search for x in xft and find some value xi associated with the treap . We then use the treap find(x) method on to answer the query. The entire method is a one-liner:
The first find(x) operation (on xft) takes time. The second find(x) operation (on a treap) takes time, where r is the size of the treap. Later in this section, we will show that the expected size of the treap is so that this operation takes time. This is an application of Jensen’s Inequality: If , then .
Adding an element to a YFastTrie is also fairly simple—most of the time. The add(x) method calls xft.find(x) to locate the treap, , into which x should be inserted. It then calls t.add(x) to add x to t. At this point, it tosses a biased coin that comes up as heads with probability and as tails with probability . If this coin comes up heads, then x will be added to xft.
This is where things get a little more complicated. When x is added to xft, the treap t needs to be split into two treaps, and . The treap
contains all the values less than or equal to x; is the original treap, t, with the elements of removed. Once this is done, we add the pair (x, t1) to xft. The following illustration shows an example.
The implementation of the add() method is:
Adding x to t takes time. Splitting
into and can also be done in expected time. Adding the pair (x, t1) to xft takes time, but only happens with probability . Therefore, the expected running time of the add(x) operation is
The remove(x) method undoes the work performed by add(x). We use xft to find the leaf, u, in xft that contains the answer to xft.find(x). From u, we get the treap, t, containing x and remove x from t. If x was also stored in xft (and x is not equal to ) then we remove x from xft and add the elements from ’s treap to the treap, , that is stored by ’s successor in the linked list. This is illustrated below.
The implementation of the remove() method is:
Finding the node u in xft takes expected time. Removing x from t takes expected time. Merging all the elements of into can be done in time. If necessary, removing x from xft takes time, but x is only contained in xft with probability . Therefore, the expected time to remove an element from a YFastTrie is .
Earlier in the discussion, we delayed arguing about the sizes of treaps in this structure until later. Before finishing this module, we prove the result we need.
Lemma 1. Let x be an integer stored in a YFastTrie and let denote the number of elements in the treap, t, that contains x. Then .
Proof: Refer to above illustration. Let denote the elements stored in the YFastTrie. The treap t contains some elements greater than or equal to x. These are , where is the only one of these elements in which the biased coin toss performed in the add(x) method turned up as heads. In other words, is
equal to the expected number of biased coin tosses required to obtain the
first heads.
Note: This analysis ignores the fact that never exceeds . However, this only decreases , so the upper bound still holds.
Similarly, the elements of t smaller than x are where all these coin tosses turn up as tails and the coin toss for turns up as heads. Therefore, , since this is the same coin tossing experiment considered in the preceding paragraph, but one in which the last toss is not counted. In summary, , so
Lemma 1 was the last piece in the proof of the following theorem,
which summarizes the performance of the YFastTrie:
Theorem 2: A YFastTrie implements the SSet interface for w-bit integers. A YFastTrie supports the operations add(x), remove(x), and find(x) in expected time per operation. The space used by a YFastTrie that stores n values is .
The term in the space requirement comes from the fact that xft always stores the value . The implementation could be modified (at the expense of adding some extra cases to the code) so that it is unnecessary to store this value. In this case, the space requirement in the theorem
becomes .
Let’s now create an instance of YFastTrie class and use it by adding, searching, and removing elements in a random manner.
Explanation
Let’s start our explanation from the start of main():
- Line 109: Initializing a new
Randomobject with a seed value of 0. - Line 111–114: Adding
nrandom integers to theYFastTrie, using therandIntmethod of theRandomobject to generate values between 0 and 100 timesn. - Line 117–118: Printing the values of the
xftlist, which contains the minimum values of each block in theYFastTrie. - Line 120–121: Printing the values of the
xftlist, which contains the maximum values of each block in theYFastTrie. - Line 125–127: Searching for
nrandom integers in theYFastTrie, using thefind()method to check if each value is in the tree.
Partial Integers
Quiz on Data Structures for Integers